@Calvin-Xu (Member)

Description

Addresses #2109

Implements Gated Attention per https://github.com/qiuzh20/gated_attention and sweeps to find optimal LR scaling factor.

This has been basically ready for a while, but we are rerunning the 1.2B track on v5p-32 to get good hardware-FLOPs data points, and we are having trouble launching v5p-32 specifically on our clusters. Putting up a draft PR to let people know this is being worked on, lest effort be duplicated (Will almost did).
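For context, the core idea of the gated-attention variant (per the qiuzh20/gated_attention reference) is an elementwise sigmoid gate applied to the attention output before the output projection. A minimal NumPy sketch with illustrative names and shapes, not this PR's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attn_output(x, attn_out, w_gate):
    """Apply a sigmoid output gate to the attention result.

    x:        (seq, d_model)             residual-stream input to the block
    attn_out: (seq, n_heads * head_dim)  concatenated SDPA output
    w_gate:   (d_model, n_heads * head_dim)  separate gate projection
    """
    gate = sigmoid(x @ w_gate)   # elementwise gate in (0, 1)
    return attn_out * gate       # gated output, fed to the output projection

# Tiny example: with zero gate weights the gate is exactly 0.5 everywhere,
# so the output is attn_out scaled by 0.5.
rng = np.random.default_rng(0)
seq, d_model, nh_hd = 4, 8, 8
x = rng.normal(size=(seq, d_model))
attn_out = rng.normal(size=(seq, nh_hd))
w_gate = np.zeros((d_model, nh_hd))
y = gated_attn_output(x, attn_out, w_gate)
```

The gate is computed from the block input with its own projection here; where that projection lives (separate matmul vs. fused into QKV) is exactly the design point discussed below.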

Checklist

  • You ran `uv run python infra/pre-commit.py --all-files` to lint/format your code
  • You ran `pytest` to test your code

@Calvin-Xu changed the title from "Calvin/gated attention" to "Gated Attention & Scaling Speedruns" on Jan 5, 2026
@Calvin-Xu (Member, Author)

For some reason, not having the separate gate projection seems to be much more expensive FLOP-wise here, even after obtaining the results on v5p-32. Will revert to the separate gate projection and redo the 2x and 2.5x sweeps.
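On paper, a separate gate projection strictly adds parameters and matmul FLOPs over fusing the gate into an existing projection, so the hardware-FLOPs observation above is counterintuitive and presumably comes down to XLA padding/dimension alignment rather than the raw math. A back-of-the-envelope count, with made-up dimensions rather than the PR's actual 1.2B config:

```python
# Per-token matmul FLOPs for the attention projections, counting a
# multiply-accumulate as 2 FLOPs. Dimensions are illustrative only.
d_model, n_heads, head_dim = 2048, 16, 128
d_attn = n_heads * head_dim

qkv_flops = 2 * d_model * 3 * d_attn   # fused QKV projection
out_flops = 2 * d_attn * d_model       # output projection
gate_flops = 2 * d_model * d_attn      # separate sigmoid-gate projection

baseline = qkv_flops + out_flops
gated = baseline + gate_flops
print(f"gate projection overhead vs attention projections: "
      f"{gate_flops / baseline:.1%}")
```

With these shapes the separate gate is a fixed 25% overhead on the attention projections (much less as a fraction of total model FLOPs once MLPs are included), so any measured regression from fusing it away is likely a layout/padding effect on the TPU rather than an arithmetic one.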

Commit: "let XLA do its magic & not mess up dimension size alignment"